GPL Tools/ARM console
ARM firmware analysis console

This will contain all my firmware analysis scripts, which are currently scattered around.

Download

Git repo: https://github.com/alexdu/ARM-console

    git clone git://github.com/alexdu/ARM-console.git

Zip:

    wget http://github.com/alexdu/ARM-console/zipball/master

Preparing to run

Requirements

* Python (I use 2.6 under Linux, but it should run under any major operating system)
* some Python libraries:

    sudo apt-get install python python-dev python-scipy python-tk python-profiler graphviz libpng12-dev ipython
    sudo apt-get install python-setuptools python-matplotlib
    sudo easy_install pydot easygui cheetah ahocorasick profilestats sympy

  Use IPython 0.10 (the latest version is not compatible) and Sympy 0.6.7 (Sympy 0.7.1 causes problems when decompiling).
* arm-elf-gcc in your PATH (see Build instructions/550D for how to do that)
* IPython versions 0.11 and 0.12 are not compatible with the current ARM-console; downgrade to 0.10.2.
* at least 4 GB of RAM (or the skills to optimize the script). The scripts cache a lot of data in Python dictionaries, which is why they run 10-100 times faster than in IDAPython.

Step by step setting up on OS X 10.6.7 (by coutts). *** Confirmed working on 10.6 & 10.7 ***

Input and source files

Prepare a working directory where you will put the input files. You will need:

* Some dumps, with the .bin extension. Include the load address in the dump name.
* Some databases, in IDC or Stubs (*.S) format. Try to give them names similar to the dumps, to help the autodetection.
* Unzip the scripts in the same folder

Example of contents of the working folder:

    scripts/
    main.py
    README.md
    autoexec.0x8A000.bin    (http://groups.google.com/group/ml-devel/msg/1c690d8dee580ee3)
    5d2.204.0xff810000.bin
    550d.108.0xff010000.bin
    550d.109.0xff010000.bin
    5d2.204.AJ.idc
    550d.108.20101116_indy_ROM0.idc
    autoexec.S
    stubs-5d2.204.S    (http://bitbucket.org/hudson/magic-lantern/src/tip/stubs-5d2.204.S)
    stubs-550d.108.S    (http://bitbucket.org/hudson/magic-lantern/src/tip/stubs-550d.108.S)

Running in interactive mode

Start the program with:

    python main.py

and you should get this prompt:

    ARM firmware analysis console ready.
    In [1]:

This is the IPython prompt; here you can browse the dumps, find/verify matches between firmware versions, and lots of other cool stuff. If you are new to IPython, be sure to skim this tutorial: http://ipython.scipy.org/doc/nightly/html/interactive/tutorial.html

Hex numbers

Python displays integers in decimal by default. If you know how to change the default to hex, please leave a message. Until then, you'll have to use these:

    In [1]: hex(100)
    Out[1]: 64
    In [2]: hex(-1)
    Out[2]: FFFFFFFF
    In [3]: int("babe", 16)
    Out[3]: 47806

Loading the dumps

You can select the dumps to load with a regex:

    In [4]: D = load_dumps("(108|204|autoexec)")
    Binary dump (*.bin)        LoadAddr   IDC database (*.idc)
    550d.108.0xff010000.bin    FF010000   550d.108.20101116_indy_ROM0.idc
    autoexec.0x8A000.bin       8A000      n/a
    5d2.204.0xff810000.bin     FF810000   5d2.204.AJ.idc
    ...
    In [5]: D
    Out[5]: [Dump of 550d.108.0xff010000.bin, Dump of 5d2.204.0xff810000.bin, Dump of autoexec.0x8A000.bin]

You will want to assign a short name to each dump. Hint: they are sorted by the .bin file name.

    In [6]: t2i, mk2, ml = D

Or, if you load only a single binary file:

    In [6]: ml = D[0]

The script will auto-detect IDC files with similar filenames, and load some info from them. You'll have to load stubs (*.S) files manually:

    In [7]: ml.load_names("stubs-550d.108.S")
    Found 80 stubs in stubs-550d.108.S.
    In [8]: ml.load_names("autoexec.S")
    Found 8 stubs in autoexec.S.

Automatic function guess

This will try to find function calls and identify functions inside the firmware. It is experimental, but should be harmless. I prefer to run it before generating the HTML.

    In [9]: guessfunc.run(ml)

Browsing the firmware: HTML

Run this to create a browsable HTML version of a dump:

    In [10]: html.quick(ml)
    In [11]: html.quick(t2i)

When it's ready, open index.html in a WebKit-based browser (Firefox is too slow, sorry!).

If you want a more thorough analysis of the firmware, run:

    In [12]: html.full(ml)

A full analysis of the ML firmware takes 1-2 minutes. The same analysis for the 550D firmware takes around 1 day, or less if you help me optimize the algorithms :)

If you can leave the computer on for a week, just run this to analyze all your dumps:

    In [13]: html.full(D)

!!! THE HTML FILES WILL CONTAIN CANON COPYRIGHTED MATERIAL !!!
!!! DO NOT SHARE THEM WITH ANYONE !!!
(see the FAQ for details: http://magiclantern.wikia.com/wiki/FAQ#How_do_I_get_a_ROM0.bin_firmware_image.3F)

Of course, if you disassemble the Magic Lantern firmware (autoexec.bin), no Canon code will be in the output files.

Browsing the firmware: plain text

If you prefer to browse the disassembly in your favorite text editor, just export it to a file:

    In [14]: t2i.save_disasm("550d.108.dis")
    Saving disassembly to 550d.108.dis...

The format is somewhat similar to the one obtained with disassemble.pl from CHDK (it uses objcopy/objdump). Main advantage over HTML: easy full-text search.
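Full-text search over the exported file can also be scripted. The following is a minimal sketch, assuming only that the export is a plain-text file with one instruction per line. `search_disasm` is a hypothetical helper written for this page, not part of ARM-console, and the sample lines are copied from listings on this page rather than from a real export.

```python
import re

def search_disasm(lines, pattern, context=0):
    """Return (line_number, text) pairs for lines matching a regex.

    lines   -- any iterable of text lines, e.g. an open .dis file
    pattern -- regular expression, matched case-insensitively
    context -- number of extra lines to include before and after each hit
    """
    rx = re.compile(pattern, re.IGNORECASE)
    lines = [l.rstrip("\n") for l in lines]
    wanted = set()
    for i, text in enumerate(lines):
        if rx.search(text):
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            wanted.update(range(lo, hi))
    # Line numbers are 1-based, as in a text editor.
    return [(i + 1, lines[i]) for i in sorted(wanted)]

# Sample lines in the style of the listings on this page (illustration only).
sample = [
    "ff0534c0: e3a01000 mov r1, #0 ; 0x0",
    "ff0534c4: eb005a69 bl @TakeSemaphore",
    "ff0534c8: e3100001 tst r0, #1 ; 0x1",
    "ff0534d8: 1bff00cf blne @assert_0",
]

# Find all (conditional) branch-with-link instructions to named functions.
hits = search_disasm(sample, r"\bbl\w*\s+@")
for number, text in hits:
    print("%d: %s" % (number, text))
```

With a real export, the same helper could be called as `search_disasm(open("550d.108.dis"), "SoundDevice", context=2)`.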
Browsing the firmware: IPython console

First, select a dump:

    In [15]: sel t2i

For quick browsing, use the g magic command, which works somewhat like the G key in IDA:

    In [16]: g 0xff053490+40
    ff0534b8: e3a05000 mov r5, #0 ; 0x0
    ff0534bc: e5940058 ldr r0, [r4, #88]
    ff0534c0: e3a01000 mov r1, #0 ; 0x0
    ff0534c4: eb005a69 bl @TakeSemaphore
    ff0534c8: e3100001 tst r0, #1 ; 0x1
    ff0534cc: 159f2108 ldrne r2, [pc, #264] ; 0xff0535dc: pointer to 0x4ce
    ff0534d0: 128f10dc addne r1, pc, #220 ; *'SoundDevice\\SoundDevice_CODEC.c'
    ff0534d4: 128f0f41 addne r0, pc, #260 ; *'!IS_ERROR( TakeSemaphore( m_hSemTask, FOREVER ))'
    ff0534d8: 1bff00cf blne @assert_0
    ff0534dc: e1d400d0 ldrsb r0, [r4]

    In [17]: g DebugMsg
    // Start of function: DebugMsg
    NSTUB(DebugMsg, ff0673ec):
    ff0673ec: e92d000f push {r0, r1, r2, r3}
    ff0673f0: e92d41f0 push {r4, r5, r6, r7, r8, lr}
    ff0673f4: e59f812c ldr r8, [pc, #300] ; 0xff067528: pointer to 0x2b6c
    ff0673f8: e1a04001 mov r4, r1
    ff0673fc: e5981000 ldr r1, [r8]
    ff067400: e24dd088 sub sp, sp, #136 ; 0x88
    ff067404: e3510000 cmp r1, #0 ; 0x0
    ff067408: 135000ff cmpne r0, #255 ; 0xff
    ff06740c: 1591200c ldrne r2, [r1, #12]
    ff067410: 13520000 cmpne r2, #0 ; 0x0

You may search for strings using a regex:

    In [18]: s purple
    finding strings...
    ff56c718: 'purple'
    ff56c5bc: 'mediumpurple'

    In [19]: s (mvr|set).*filter
    ff541dd0: '***** DlgMnPictureStyleDetail.c SetDataToStorage IDC_DPM_FILTER(%d)'
    ff541ee0: '***** DlgMnPictureUserDetail.c SetDataToStorage IDC_DPM_FILTER(%d)'
    ff064cbc: 'SetFilterRec'
    ff064f98: 'SetFilterOff'
    ff1aa558: 'mvrSetDeblockingFilter (alpha = %d, beta = %d)'
    ff1aab24: 'mvrSetDefDBFilter (A = %d, B = %d)'
    ff1aabd4: 'mvrSetDeblockingFilter'
    ff1aacd8: 'mvrSetDefDBFilter'

Or search for references to some names / values:

    In [20]: r additional_version
    GUI_GetFirmVersion+24:
    ff20cbc8: e59f019c ldr r0, [pc, #412] ; 0xff20cd6c: pointer to 0x15094 (additional_version) 0x15094 (additional_version)
    GUI_GetFirmVersion+72:
    ff20cbf8: e59f116c ldr r1, [pc, #364] ; 0xff20cd6c: pointer to 0x15094 (additional_version) 0x15094 (additional_version)
    sub_FF1FBA24+14344:
    ff1ff22c: 159f0118 ldrne r0, [pc, #280] ; 0xff1ff34c: pointer to 0x15094 (additional_version) 0x15094 (additional_version)

    In [21]: r 0x1234
    sdSetRelativeAddress+28:
    ff3f229c: e59f01f0 ldr r0, [pc, #496] ; 0xff3f2494: pointer to 0x1234 0x1234

From now on, TAB completion and quick help are your friends:

    t2i.<TAB>
    funcs refs strings strrefs ...etc...

    In [22]: t2i?
    Base Class: scripts.disasm.Dump
    Docstring:
        Contains all the info about a dump: ROM contents, disassembly, references, function names...
    ...

Most functions that output a lot of text (like disasm, strings, refs) can display their output in a codebox (from easygui). To enable that, just pass gui=1 as the last argument:

    In [23]: t2i.refs("sounddev", gui=1)

If you want the GUI boxes enabled by default, edit disasm.py (you'll find the setting there).
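The "pointer to" comments in the listings above come from PC-relative loads. In ARM mode the PC reads as the address of the current instruction plus 8, so the word being loaded sits at the instruction address plus 8 plus the immediate offset. Below is a small sketch of that arithmetic, checked against addresses and offsets taken from the listings on this page. `pc_relative_target` is a hypothetical helper for illustration, not an ARM-console function, and it assumes 32-bit ARM (not Thumb) code.

```python
def pc_relative_target(insn_addr, offset):
    """Address read by an ARM-mode PC-relative ldr located at insn_addr."""
    # In ARM mode the PC reads as the instruction address + 8.
    return (insn_addr + 8 + offset) & 0xFFFFFFFF

# (instruction address, immediate offset, address shown in the listing comment)
examples = [
    (0xFF20CBC8, 412, 0xFF20CD6C),  # GUI_GetFirmVersion+24
    (0xFF20CBF8, 364, 0xFF20CD6C),  # GUI_GetFirmVersion+72
    (0xFF1FF22C, 280, 0xFF1FF34C),  # sub_FF1FBA24+14344
    (0xFF3F229C, 496, 0xFF3F2494),  # sdSetRelativeAddress+28
    (0xFF0673F4, 300, 0xFF067528),  # DebugMsg
    (0xFF0534CC, 264, 0xFF0535DC),  # SoundDevice code
]

for addr, off, expected in examples:
    target = pc_relative_target(addr, off)
    status = "ok" if target == expected else "MISMATCH"
    print("%08x + 8 + %d -> %08x (%s)" % (addr, off, target, status))
```

Two different instructions in GUI_GetFirmVersion resolve to the same literal at 0xff20cd6c, which is why both are reported as references to additional_version.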
Annotating addresses in the firmware

You can use some functions whose names are inspired by IDAPython / IDC:

    In [24]: t2i.MakeName(0xFF06AFC0, "MEM_GetSizeOfMaxRegion")

To delete a name, pass None (or an empty string "") instead of the name:

    In [25]: t2i.MakeName(0x4, None)
    Deleting name 4 -> GUI_GetMWBCaption

To create a function, you can specify the start address and let it guess the end address:

    In [26]: t2i.MakeFunction(0xFF06A0F4)
    Size: 72

Of course, you can specify both the start and end addresses:

    In [27]: t2i.MakeFunction(0xFF06A0F4, 0xFF06A138)

Right now, things may go wrong if you try to remove a function or to change an existing one, so... don't!

After you annotate some addresses in the firmware, you may want to see the new names in the HTML version. Just run:

    In [28]: html.update(t2i)

It will (try to) update only the files which reference the newly annotated addresses.

Loading and saving names

If you want to load names from a file other than the auto-detected ones, use this:

    In [29]: t2i.load_names("stubs-550d.108.S")
    Found 80 stubs in stubs-550d.108.S.
    Overwriting name prop_cleanup
    Overwriting name free
    ...

You can also pass an IDC file (the format is autodetected).

If you want to export your names, use:

    In [30]: t2i.save_names("mynames.S")
    Saved 56800 names out of 56800.
    In [31]: t2i.save_names("mynames.idc")
    Saved 56800 names out of 56800. Deleted 1 names.

What if you want to export only your changes? No problem:

    In [32]: t2i.save_new_names("mychanges.S")
    Saved 1 names out of 56800.
    In [33]: t2i.save_new_names("mychanges.idc")
    Saved 1 names out of 56800. Deleted 1 names.
    In [34]: cat mychanges.S
    NSTUB(0xFF06AFC0, MEM_GetSizeOfMaxRegion)
    In [35]: cat mychanges.idc
    #include <idc.idc>
    static main() {
    MakeName(0xFF06AFC0, MEM_GetSizeOfMaxRegion)
    MakeName(0x4, "")
    }

It will save only the names which were not loaded from a file. Deleted names are only saved in IDC format.

Functions can't be exported yet, so for now it's better to use IDA for this.
The demo version of IDA can import/export IDC files.

Matching functions and addresses between different firmware versions

See GPL Tools/match.py.

NumPy'ing the firmware

If you like the idea of doing numerical analysis on the camera's firmware, then this may be for you. If you already know Matlab or Octave, take a look here: http://www.scipy.org/NumPy_for_Matlab_Users

Let's try a histogram of the values referenced in the code:

    In [36]: r = array([a[1] for a in t2i.REFLIST])
    In [37]: hist(r, 100)
    In [38]: show()

There are two big peaks, and we can't see what lies between and around them. Let's try a log histogram:

    In [39]: cla()
    In [40]: hist(r, 100, log=1)

Let's zoom in a bit:

    In [41]: slice = r& (r < 10000)
    In [42]: cla()
    In [43]: hist(slice, 100)

There are some peaks: they seem to be at 1024, 2048, 4096 and 8192 (since those are round numbers). Let's look at them:

    In [44]: b = bincount(slice.astype(int32))
    In [45]: o = argsort(-b)
    In [46]: o[:10]
    Out[46]: array(1024, 8192, 4096, 6464, 2112, 1104, 2080, 1280, 1776)

The next peak after those round numbers is 6464 = 0x1940. What could this be?

    In [47]: t2i.refs(6464, gui=1)

Want to see more lines before and after each reference?

    In [48]: t2i.refs(6464, context=5, gui=1)

So if you can figure out from this what 0x1940 is, you are a genius!

Running in non-interactive mode

Don't like the interactive mode? Start from "main.py" and create your own scripts. For example:

    from scripts import *
    D = load_dumps()
    print D

Save it as myscript.py and run it like a normal Python script:

    /home/user$ python myscript.py

Hint: when debugging, try to test your script with a smaller dump, like autoexec.bin.

API Reference

Don't miss this if you really want to use the script :)

What not to do

* Do not publish files which contain copyrighted code (from Canon or from any other third party)! If you do, you'll cause lots of trouble for the Magic Lantern community.
* Do not load too many dumps at once! The script is VERY memory hungry, and IT CAN CRASH LINUX IN SECONDS!!!
If the system starts swapping, you'll have to reboot your machine! Alternatively, disable swap (like I did); the script (or any other memory-hungry program) will then be killed when it asks for too much memory.
* Do not change the working directory! The scripts use relative paths and won't find their required files.

----

Enjoy! --Alexdu 18:48, December 1, 2010 (UTC)